Math is not just a way of calculating numerical answers; it is a way of thinking, using clear definitions for concepts and rigorous logic to organize our thoughts and back up our assertions.
Cheng (2025)
These lecture notes use:
Some key results are listed here.
Theorem 1 (Equalities are transitive) If \(a=b\) and \(b=c\), then \(a=c\).
Theorem 2 (Substituting equivalent expressions) If \(a = b\), then for any function \(f(x)\), \(f(a) = f(b)\)
Theorem 3 If \(a<b\), then \(a+c < b+c\)
Theorem 4 (negating both sides of an inequality) If \(a < b\), then \(-a > -b\).
Theorem 5 If \(a < b\) and \(c > 0\), then \(ca < cb\).
Theorem 6 \[-a = (-1) \times a\]
Theorem 7 (adding zero changes nothing) \[a+0=a\]
Theorem 8 (Sums are symmetric) \[a+b = b+a\]
Theorem 9 (Sums are associative)
\[(a + b) + c = a + (b + c)\]
Theorem 10 (Multiplying by 1 changes nothing) \[a \times 1 = a\]
Theorem 11 (Products are symmetric) \[a \times b = b \times a\]
Theorem 12 (Products are associative) \[(a \times b) \times c = a \times (b \times c)\]
Theorem 13 (Division can be written as a product) If \(b \neq 0\), then \[\frac{a}{b} = a \times \frac{1}{b}\]
Theorem 14 (Multiplication is distributive) \[a(b+c) = ab + ac\]
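Definition 1 (Quotients, fractions, rates) A quotient, fraction, or rate is an expression of the form
\[\frac{a}{b}\]
where \(a\) is the numerator and \(b \neq 0\) is the denominator.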
Definition 2 (Ratios) A ratio is a quotient in which the numerator and denominator are measured using the same unit scales.
Definition 3 (Proportion) In statistics, a “proportion” typically means a ratio where the numerator represents a subset of the denominator.
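As a minimal sketch, assuming a hypothetical sample of ages (not data from these notes), the proportion of people aged 65 or older is a ratio whose numerator counts a subset of the people counted in the denominator. In R, `mean()` of a logical vector computes the same quantity, because `TRUE` is treated as 1 and `FALSE` as 0.

```r
# hypothetical sample of ages (illustrative data only)
ages <- c(23, 67, 45, 71, 34, 80, 52, 66)

# numerator: number of people in the subset (age 65 or older)
n_subset <- sum(ages >= 65)
# denominator: total number of people in the sample
n_total <- length(ages)

prop_65plus <- n_subset / n_total
prop_65plus          # 0.5

# equivalent shortcut: the mean of a logical vector is a proportion
mean(ages >= 65)     # 0.5
```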
Definition 4 (Proportional) Two functions \(f(x)\) and \(g(x)\) are proportional if their ratio \(\frac{f(x)}{g(x)}\) does not depend on \(x\). (cf. https://en.wikipedia.org/wiki/Proportionality_(mathematics))
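The definition can be checked numerically by evaluating the ratio on a grid of \(x\) values. The functions `f`, `g`, and `h` below are hypothetical examples chosen for illustration: \(f(x) = 3x^2\) is proportional to \(g(x) = x^2\), while \(h(x) = x^2 + 1\) is not.

```r
# hypothetical example functions
f <- function(x) 3 * x^2
g <- function(x) x^2
h <- function(x) x^2 + 1

# evaluation points; x = 0 is avoided because g(0) = 0
x <- c(0.5, 1, 2, 10)

f(x) / g(x)   # 3 3 3 3: constant in x, so f and g are proportional
h(x) / g(x)   # varies with x, so h and g are not proportional
```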
Additional reference for elementary algebra: https://en.wikipedia.org/wiki/Population_proportion#Mathematical_definition
Theorem 15 (logarithm of a product is the sum of the logs of the factors) For \(a, b > 0\): \[ \log{(a\cdot b)} = \log{a} + \log{b} \]
Corollary 1 (logarithm of a quotient)
\[\log{\frac{a}{b}} = \log{a} - \log{b}\]
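Theorem 15 and Corollary 1 can be spot-checked in R, where `log()` is the natural logarithm and `all.equal()` compares numbers up to floating-point tolerance. The values of `a` and `b` are arbitrary positive numbers chosen for illustration.

```r
a <- 6
b <- 2.5

# Theorem 15: log of a product is the sum of the logs
all.equal(log(a * b), log(a) + log(b))   # TRUE
# Corollary 1: log of a quotient is the difference of the logs
all.equal(log(a / b), log(a) - log(b))   # TRUE
```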
Theorem 16 (logarithm of an exponential function) For \(a > 0\): \[ \text{log}{\left\{a^b\right\}} = b \cdot\text{log}{\left\{a\right\}} \]
Theorem 17 (exponential of a sum)
\[\text{exp}{\left\{a+b\right\}} = \text{exp}{\left\{a\right\}} \cdot\text{exp}{\left\{b\right\}}\]
Corollary 2 (exponential of a difference)
\[\text{exp}{\left\{a-b\right\}} = \frac{\text{exp}{\left\{a\right\}}}{\text{exp}{\left\{b\right\}}}\]
Theorem 18 (exponential of a product) \[a^{bc} = {\left(a^b\right)}^c = {\left(a^c\right)}^b\]
Corollary 3 (natural exponential of a product) \[\text{exp}{\left\{ab\right\}} = (\text{exp}{\left\{a\right\}})^b = (\text{exp}{\left\{b\right\}})^a\]
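The exponent rules in Theorems 16 to 18 and Corollaries 2 and 3 can be spot-checked the same way. The values below are arbitrary; `a` is kept positive so that `a^b` is a real number for any real `b`.

```r
# arbitrary illustrative values (a > 0)
a <- 1.7
b <- 0.4
c <- -2.3

# Theorem 16: log(a^b) = b * log(a)
all.equal(log(a^b), b * log(a))
# Theorem 17 and Corollary 2: exponential of a sum / difference
all.equal(exp(a + b), exp(a) * exp(b))
all.equal(exp(a - b), exp(a) / exp(b))
# Theorem 18: a^(bc) = (a^b)^c = (a^c)^b
all.equal(a^(b * c), (a^b)^c)
all.equal(a^(b * c), (a^c)^b)
# Corollary 3: exp(ab) = exp(a)^b = exp(b)^a
all.equal(exp(a * b), exp(a)^b)
all.equal(exp(a * b), exp(b)^a)
```

Each comparison should print `TRUE`.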
Exercise 1 For \(a \ge 0\) and \(b,c \in \mathbb{R}\), when does \((a^b)^c = a^{(b^c)}\)?
Solution 1. Short answer: rarely (that’s all you need to know for this course).
Long answer:
If \((a^b)^c = a^{(b^c)}\), then since \((a^b)^c = a^{bc}\), we have: \[a^{bc} = a^{(b^c)}\] \[\text{log}{\left\{a^{bc}\right\}} = \text{log}{\left\{a^{(b^c)}\right\}}\] \[bc \cdot \text{log}{\left\{a\right\}} = b^c\cdot \text{log}{\left\{a\right\}} \tag{1}\]
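Equation 1 holds when \(a = 1\), since then \(\text{log}{\left\{a\right\}} = 0\) and both sides of Equation 1 equal 0, or when \(bc = b^c\) (see Exercise 2). Equation 1 does not apply when \(a = 0\), because \(\text{log}{\left\{0\right\}}\) is undefined; in that case \(a^{bc}\) and \(a^{(b^c)}\) must be compared directly, and they are equal when \(bc\) and \(b^c\) are both positive (both sides equal 0) or both zero (both sides equal 1).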
In particular, when \(a=0\) and \(c=0\), \(bc = 0\) and \(b^c = 1\) (for any \(b \in \mathbb{R}\)), so \(\text{sign}{\left\{bc\right\}}\neq \text{sign}{\left\{b^c\right\}}\), and \((a^b)^c \neq a^{(b^c)}\):
\[ \begin{aligned} (a^b)^c &= (0^b)^0 \\ &= 1 \end{aligned} \]
\[ \begin{aligned} a^{(b^c)} &= 0^{(b^0)} \\ &= 0^1 \\ &= 0 \end{aligned} \]
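R follows the usual convention \(0^0 = 1\), so this counterexample can be reproduced directly. The choice `b <- 2` is arbitrary; a nonnegative `b` keeps \(0^b\) finite.

```r
b <- 2

left  <- (0^b)^0    # (0^2)^0 = 0^0 = 1
right <- 0^(b^0)    # 0^(2^0) = 0^1 = 0

left == right       # FALSE
```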
Exercise 2 For \(b,c \in \mathbb{R}\), when does \(b^c = bc\)?
Solution 2. \(bc = b^c\) in each of the following cases:
See the red contours in Figure 2 for a visualization.
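One way to generate points on that contour, sketched here under the assumption \(b > 0\) (this is not necessarily how the notes intend the solution to be found): dividing both sides of \(b^c = bc\) by \(b\) gives \(b^{c-1} = c\), so for any \(c > 0\) with \(c \neq 1\), \(b = c^{1/(c-1)}\) is a solution. This formula does not produce the solutions with \(c = 1\) (where \(b^c = b = bc\) for every \(b\)) or with \(b = 0\) and \(c > 0\) (where both sides equal 0). The helper `solve_b` is hypothetical, introduced only for this illustration.

```r
# hypothetical helper: for c > 0 and c != 1, returns the positive b with b^c == b*c
solve_b <- function(c) c^(1 / (c - 1))

cs <- c(0.5, 2, 3)
bs <- solve_b(cs)   # 4, 2, sqrt(3)

# both columns should agree on every row
cbind(b = bs, c = cs, `b^c` = bs^cs, `b*c` = bs * cs)
```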
`b*c_f` <- function(b, c) b*c
`b^c_f` <- function(b, c) b^c
values_b <- seq(0, 5, by = .01)
values_c <- seq(-.5, 3, by = .01)
`b*c` <- outer(values_b, values_c, `b*c_f`)
`b^c` <- outer(values_b, values_c, `b^c_f`)
`b^c`[is.infinite(`b^c`)] = NA
opacity <- .3
z_min <- min(`b*c`, `b^c`, na.rm = TRUE)
z_max <- 5
plotly::plot_ly(
x = ~values_b,
y = ~values_c
) |>
plotly::add_surface(
z = ~ t(`b*c`),
contours = list(
z = list(
show = TRUE,
start = -1,
end = 1,
size = .1
)
),
name = "b*c",
showscale = FALSE,
opacity = opacity,
colorscale = list(c(0, 1), c("green", "green"))
) |>
plotly::add_surface(
opacity = opacity,
colorscale = list(c(0, 1), c("red", "red")),
z = ~ t(`b^c`),
contours = list(
z = list(
show = TRUE,
start = z_min,
end = z_max,
size = .2
)
),
showscale = FALSE,
name = "b^c"
) |>
plotly::layout(
scene = list(
xaxis = list(
# type = "log",
title = "b"
),
yaxis = list(
# type = "log",
title = "c"
),
zaxis = list(
# type = "log",
range = c(z_min, z_max),
title = "outcome"
),
camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
aspectratio = list(x = .9, y = .8, z = 0.7)
)
)

`b^c - b*c_f` <- function(b, c) `b^c_f`(b, c) - `b*c_f`(b, c)
mat1 <- outer(values_b, values_c, `b^c - b*c_f`)
mat1[is.infinite(mat1)] = NA
opacity <- .3
plotly::plot_ly(
x = ~values_b,
y = ~values_c
) |>
plotly::add_surface(
z = ~ t(mat1),
contours = list(
z = list(
show = TRUE,
start = 0,
end = 1,
size = 1,
color = "red"
)
),
name = "b^c - b*c",
showscale = TRUE,
opacity = opacity
) |>
plotly::layout(
scene = list(
xaxis = list(
# type = "log",
title = "b"
),
yaxis = list(
# type = "log",
title = "c"
),
zaxis = list(
title = "outcome"
),
camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
aspectratio = list(x = .9, y = .8, z = 0.7)
)
)

Theorem 19 (\(\text{exp}{\left\{\right\}}\) and \(\text{log}{\left\{\right\}}\) are mutual inverses) For \(a > 0\): \[\text{exp}{\left\{\text{log}{\left\{a\right\}}\right\}} = \text{log}{\left\{\text{exp}{\left\{a\right\}}\right\}} = a\]
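A quick numerical check of Theorem 19, with arbitrary illustrative inputs. Note the asymmetry: \(\text{exp}{\left\{\text{log}{\left\{a\right\}}\right\}} = a\) needs \(a > 0\), while \(\text{log}{\left\{\text{exp}{\left\{x\right\}}\right\}} = x\) holds for every real \(x\).

```r
a <- c(0.01, 1, 42)          # positive inputs, so log(a) is defined
all.equal(exp(log(a)), a)    # TRUE

x <- c(-3, 0, 2.5)           # any real inputs
all.equal(log(exp(x)), x)    # TRUE
```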
Theorem 20 (Constant rule) If \(c\) is constant with respect to \(x\), then: \[\frac{\partial}{\partial x}c = 0\]
Theorem 21 (Constant multiple rule) If \(a\) is constant with respect to \(x\), then: \[\frac{\partial}{\partial x}ay = a \frac{\partial y}{\partial x}\]
Theorem 22 (Power rule) \[\frac{\partial}{\partial x}x^q = qx^{q-1}\]
Theorem 23 (Derivative of natural logarithm) \[\text{log}'{\left\{x\right\}} = \frac{1}{x} = x^{-1}\]
Theorem 24 (derivative of exponential) \[\text{exp}'{\left\{x\right\}} = \text{exp}{\left\{x\right\}}\]
Theorem 25 (Product rule) \[(ab)' = ab' + ba'\]
Theorem 26 (Quotient rule) \[(a/b)' = a'/b - (a/b^2)b'\]
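The derivative rules can be sanity-checked with a central finite-difference approximation, \(f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}\). This is a sketch: the helper `num_deriv`, the step size `h`, the evaluation point, and the choice \(a(x) = \sin(x)\), \(b(x) = x^2\) for the product and quotient rules are all illustrative assumptions.

```r
# hypothetical helper: central finite-difference approximation of f'(x)
num_deriv <- function(f, x, h = 1e-6) (f(x + h) - f(x - h)) / (2 * h)

x <- 1.3
q <- 2.5

# Theorem 22 (power rule)
all.equal(num_deriv(function(x) x^q, x), q * x^(q - 1), tolerance = 1e-6)
# Theorem 23 (derivative of the natural logarithm)
all.equal(num_deriv(log, x), 1 / x, tolerance = 1e-6)
# Theorem 24 (derivative of the exponential)
all.equal(num_deriv(exp, x), exp(x), tolerance = 1e-6)
# Theorem 25 (product rule) with a = sin(x), b = x^2: ab' + ba'
all.equal(num_deriv(function(x) sin(x) * x^2, x),
          sin(x) * (2 * x) + x^2 * cos(x), tolerance = 1e-6)
# Theorem 26 (quotient rule): a'/b - (a/b^2) b'
all.equal(num_deriv(function(x) sin(x) / x^2, x),
          cos(x) / x^2 - (sin(x) / x^4) * (2 * x), tolerance = 1e-6)
```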
Theorem 27 (Chain rule) \[\begin{aligned} \frac{\partial a}{\partial c} &= \frac{\partial a}{\partial b} \frac{\partial b}{\partial c} \\ &= \frac{\partial b}{\partial c} \frac{\partial a}{\partial b} \end{aligned} \]
or in Euler/Lagrange notation:
\[(f(g(x)))' = g'(x) f'(g(x))\]
Corollary 4 (Chain rule for logarithms) \[ \frac{\partial}{\partial x}\log{f(x)} = \frac{f'(x)}{f(x)} \]
Proof. Apply Theorem 27 and Theorem 23.
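Base R's `D()` differentiates simple expressions symbolically, which gives another way to check Corollary 4. The choice \(f(x) = x^2 + 1\) and the evaluation point \(x = 2\) are illustrative assumptions; the corollary predicts \(\frac{f'(x)}{f(x)} = \frac{2x}{x^2 + 1}\), which is 0.8 at \(x = 2\).

```r
# symbolic derivative of log(f(x)) with f(x) = x^2 + 1
d_log_f <- D(quote(log(x^2 + 1)), "x")
d_log_f   # an unevaluated expression equivalent to 2 * x / (x^2 + 1)

# compare with f'(x) / f(x) at x = 2
x0 <- 2
all.equal(eval(d_log_f, list(x = x0)), 2 * x0 / (x0^2 + 1))   # TRUE
```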
Definition 5 (Dot product/linear combination/inner product) For any two real-valued vectors \(\tilde{x}= (x_1, \ldots, x_n)\) and \(\tilde{y}= (y_1, \ldots, y_n)\), the dot-product, linear combination, or inner product of \(\tilde{x}\) and \(\tilde{y}\) is:
\[\tilde{x}\cdot \tilde{y}= \tilde{x}^{\top} \tilde{y}\stackrel{\text{def}}{=}\sum_{i=1}^nx_i y_i\]
Theorem 28 (Dot product is symmetric) The dot product is symmetric:
\[\tilde{x}\cdot \tilde{y}= \tilde{y}\cdot \tilde{x}\]
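Proof. Apply Theorem 11 (products are symmetric) to each term of the sum in Definition 5: \(\sum_{i=1}^n x_i y_i = \sum_{i=1}^n y_i x_i\).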
(adapted from Fieller (2016), §7.2)
Let \(\tilde{x}\) and \(\tilde{\beta}\) be column vectors of length \(p\), or in other words, matrices of dimension \(p \times 1\):
\[ \tilde{x}= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \]
\[ \tilde{\beta}= \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \]
Definition 6 (Transpose) The transpose of a column vector is the row vector with the same sequence of entries:
\[ \tilde{x}' \equiv \tilde{x}^\top \equiv [x_1, x_2, ..., x_p] \]
Example 1 (Dot product as matrix multiplication) \[ \begin{aligned} \tilde{x}\cdot \tilde{\beta} &= \tilde{x}^{\top} \tilde{\beta} \\ &= [x_1, x_2, ..., x_p] \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \\ &= x_1\beta_1+x_2\beta_2 +...+x_p \beta_p \end{aligned} \]
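Example 1 and Theorem 28 can be mirrored in R, where `t()` transposes, `%*%` is matrix multiplication, and `crossprod(x, y)` computes \(x^{\top} y\). The vectors below are arbitrary illustrative values; for them, \(0.5 - 2 + 6 = 4.5\).

```r
x <- c(1, 2, 3)
beta <- c(0.5, -1, 2)

sum(x * beta)              # definition: sum of elementwise products, 4.5
drop(t(x) %*% beta)        # x^T beta: a 1 x 1 matrix, dropped to a scalar
drop(crossprod(x, beta))   # crossprod(x, beta) is t(x) %*% beta
sum(beta * x)              # symmetry of the dot product (Theorem 28)
```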
Theorem 29 (Transpose of a sum) \[(\tilde{x}+\tilde{y})^{\top} = \tilde{x}^{\top} + \tilde{y}^{\top}\]
Definition 7 (Vector derivative) If \(f(\tilde{\beta})\) is a scalar-valued function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}^{\top}\tilde{\beta}\), then:
\[ \frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) \\ \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) \\ \vdots \\ \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]
Definition 8 (Row-vector derivative) If \(f(\tilde{\beta})\) is a scalar-valued function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}^{\top}\tilde{\beta}\), then:
\[ \frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) & \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) & \cdots & \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]
Theorem 30 (Row and column derivatives are transposes) \[\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta})\right)}^{\top}\]
\[\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta})\right)}^{\top}\]
Theorem 31 (Derivative of a dot product) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{x}\cdot \tilde{\beta}= \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}\cdot \tilde{x}= \tilde{x} \]
Proof. \[ \begin{aligned} \frac{\partial}{\partial \beta} (x^{\top}\beta) &= \begin{bmatrix} \frac{\partial}{\partial \beta_1}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ \frac{\partial}{\partial \beta_2}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ \vdots \\ \frac{\partial}{\partial \beta_p}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \end{bmatrix} \\ &= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \\ &= \tilde{x} \end{aligned} \]
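Theorem 31 can also be checked numerically by approximating each entry of the column derivative in Definition 7 with a central finite difference. The helper `num_grad`, its step size, and the vectors are illustrative assumptions, not part of the notes.

```r
# hypothetical helper: central finite-difference gradient of a scalar function
# of a vector, returned as a vector with one entry per coordinate of beta
num_grad <- function(f, beta, h = 1e-6) {
  sapply(seq_along(beta), function(j) {
    e <- replace(numeric(length(beta)), j, h)   # step of size h in coordinate j
    (f(beta + e) - f(beta - e)) / (2 * h)
  })
}

x <- c(1, 2, 3)
beta <- c(0.5, -1, 2)
f <- function(beta) sum(x * beta)   # f(beta) = x . beta

all.equal(num_grad(f, beta), x, tolerance = 1e-6)   # TRUE: the gradient is x
```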
Definition 9 (Quadratic form) A quadratic form is a mathematical expression with the structure
\[\tilde{x}^{\top} \mathbf{S} \tilde{x}\]
where \(\tilde{x}\) is a vector and \(\mathbf{S}\) is a matrix with compatible dimensions for vector-matrix multiplication.
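For example, with an arbitrary \(2 \times 2\) matrix \(\mathbf{S}\) and vector \(\tilde{x}\) (illustrative values only), the quadratic form is a single number, which can be computed either as \(\tilde{x}^{\top}\mathbf{S}\tilde{x}\) or as the dot product \(\tilde{x} \cdot (\mathbf{S}\tilde{x})\):

```r
x <- c(1, -2)
S <- matrix(c(2, 1,
              0, 3), nrow = 2, byrow = TRUE)

drop(t(x) %*% S %*% x)   # x^T S x = 12
sum(x * (S %*% x))       # same value: x . (S x), with S x = (0, -6)
```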
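Theorem 32 (Derivative of a quadratic form) If \(S\) is a \(p\times p\) matrix that is constant with respect to \(\tilde{\beta}\), then:
\[ \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}'S\tilde{\beta}= (S + S^{\top})\tilde{\beta} \]
In particular, if \(S\) is symmetric (\(S = S^{\top}\)), then \(\frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}'S\tilde{\beta}= 2S\tilde{\beta}\).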
Corollary 5 (Derivative of a simple quadratic form) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}'\tilde{\beta}= 2\tilde{\beta} \]
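A numerical check with a deliberately non-symmetric \(S\) shows that the gradient of \(\tilde{\beta}'S\tilde{\beta}\) is \((S + S^{\top})\tilde{\beta}\) in general, which reduces to \(2S\tilde{\beta}\) only when \(S\) is symmetric, and to \(2\tilde{\beta}\) when \(S\) is the identity. The helper `num_grad` and all values are illustrative assumptions.

```r
# hypothetical helper: central finite-difference gradient (one entry per coordinate)
num_grad <- function(f, beta, h = 1e-6) {
  sapply(seq_along(beta), function(j) {
    e <- replace(numeric(length(beta)), j, h)
    (f(beta + e) - f(beta - e)) / (2 * h)
  })
}

S <- matrix(c(2, 1,
              0, 3), nrow = 2, byrow = TRUE)   # deliberately NOT symmetric
beta <- c(1, -2)
quad <- function(beta) drop(t(beta) %*% S %*% beta)

g_num <- num_grad(quad, beta)                                     # about (2, -11)
all.equal(g_num, drop((S + t(S)) %*% beta), tolerance = 1e-6)     # TRUE
isTRUE(all.equal(g_num, drop(2 * S %*% beta), tolerance = 1e-6))  # FALSE, since S != t(S)

# identity matrix: the gradient of beta . beta is 2 * beta (Corollary 5)
quad_I <- function(beta) sum(beta * beta)
all.equal(num_grad(quad_I, beta), 2 * beta, tolerance = 1e-6)     # TRUE
```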
Theorem 33 (Vector chain rule) \[\frac{\partial z}{\partial \tilde{x}} = \frac{\partial y}{\partial \tilde{x}} \frac{\partial z}{\partial y}\]
or in Euler/Lagrange notation:
\[(f(\tilde{g}(\tilde{x})))' = \tilde{g}'(\tilde{x}) f'(\tilde{g}(\tilde{x}))\]
Corollary 6 (Vector chain rule for quadratic forms) \[\frac{\partial}{\partial \tilde{\beta}}{{\left(\tilde{\varepsilon}(\tilde{\beta})\cdot \tilde{\varepsilon}(\tilde{\beta})\right)}} = {\left(\frac{\partial}{\partial \tilde{\beta}}\tilde{\varepsilon}(\tilde{\beta})\right)} {\left(2 \tilde{\varepsilon}(\tilde{\beta})\right)}\]
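As an illustration, consider least-squares residuals \(\tilde{\varepsilon}(\tilde{\beta}) = \tilde{y} - \mathbf{X}\tilde{\beta}\) for a hypothetical design matrix \(\mathbf{X}\) and outcome vector \(\tilde{y}\) (simulated below). Each residual \(\varepsilon_i = y_i - \tilde{x}_i^{\top}\tilde{\beta}\) has derivative \(-\tilde{x}_i\) by Theorem 31, so \(\frac{\partial}{\partial \tilde{\beta}}\tilde{\varepsilon}(\tilde{\beta}) = -\mathbf{X}^{\top}\), and Corollary 6 gives the gradient \(-2\mathbf{X}^{\top}(\tilde{y} - \mathbf{X}\tilde{\beta})\). The sketch compares this with a finite-difference gradient (helper `num_grad` is hypothetical) and confirms the gradient vanishes at the least-squares solution.

```r
set.seed(1)
n <- 5
X <- cbind(1, rnorm(n))      # hypothetical design matrix: intercept + one covariate
y <- rnorm(n)                # hypothetical outcomes
beta <- c(0.3, -0.7)         # arbitrary evaluation point

eps <- function(beta) drop(y - X %*% beta)          # residual vector
sse <- function(beta) sum(eps(beta) * eps(beta))    # eps . eps

# Corollary 6 with d eps / d beta = -t(X), a p x n matrix
grad_formula <- drop(-t(X) %*% (2 * eps(beta)))

# hypothetical helper: central finite-difference gradient
num_grad <- function(f, beta, h = 1e-6) {
  sapply(seq_along(beta), function(j) {
    e <- replace(numeric(length(beta)), j, h)
    (f(beta + e) - f(beta - e)) / (2 * h)
  })
}
all.equal(num_grad(sse, beta), grad_formula, tolerance = 1e-6)   # TRUE

# the gradient is (numerically) zero at the least-squares solution
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))
max(abs(-2 * t(X) %*% eps(beta_hat))) < 1e-8                      # TRUE
```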